DMol: A Highly Efficient and Chemical Motif-Preserving Molecule Generation Platform
We introduce a new graph diffusion model for small drug molecule generation that simultaneously offers a 10-fold reduction in the number of diffusion steps compared to existing methods, preservation of small-molecule graph motifs via motif compression, and an average 3% improvement in SMILES validity over the DiGress model across all real-world molecule benchmarking datasets. Furthermore, our approach outperforms the state-of-the-art DeFoG method in motif conservation by roughly 4%, as evidenced by high ChEMBL-likeness, QED and newly introduced shingles distance scores. The key ideas behind the approach are: to use a combination of deterministic and random subgraph perturbations, so that the node and edge noise schedules are codependent; to modify the training loss function so that it exploits the deterministic component of the schedule; and to "compress" a collection of highly relevant carbon ring and other motif structures into supernodes in a way that allows simple subsequent integration into the molecular scaffold.
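The motif-compression idea can be illustrated with a small graph-contraction sketch. It assumes the motifs (for example, ring atom sets found beforehand by a cheminformatics toolkit) are already given as disjoint node sets. The function name `compress_motifs` is hypothetical, and the sketch ignores atom and bond types, which real supernodes would need to carry so they can be expanded back into the scaffold; it is not DMol's implementation.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def compress_motifs(num_nodes: int, edges: List[Edge],
                    motifs: List[Set[int]]) -> Tuple[Dict[int, int], Set[Edge]]:
    """Contract each (disjoint) motif node set into a single supernode.

    Hypothetical helper. Returns a map from original node id to compressed
    node id, plus the undirected edges of the compressed graph as (min, max).
    """
    group: Dict[int, int] = {}
    next_id = 0
    # Each motif (e.g. a carbon ring) becomes one supernode.
    for motif in motifs:
        for v in motif:
            if v in group:
                raise ValueError(f"node {v} appears in more than one motif")
            group[v] = next_id
        next_id += 1
    # Nodes outside every motif keep their own (relabelled) node.
    for v in range(num_nodes):
        if v not in group:
            group[v] = next_id
            next_id += 1
    compressed: Set[Edge] = set()
    for u, v in edges:
        gu, gv = group[u], group[v]
        if gu != gv:  # bonds inside a motif disappear into the supernode
            compressed.add((min(gu, gv), max(gu, gv)))
    return group, compressed
```

For a six-membered ring (nodes 0 to 5) with a two-atom chain 6, 7 attached at node 0, the ring collapses to supernode 0 and the compressed graph is the path `{(0, 1), (1, 2)}`, so a generator only has to place one ring token instead of six atoms and six bonds.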
Topology-aware Graph Diffusion Model with Persistent Homology
Generating realistic graphs is challenging because it requires estimating an accurate distribution of graphs in an embedding space while preserving their structural characteristics. Existing graph generation methods, however, primarily focus on approximating the joint distribution of nodes and edges and often overlook topological properties such as connected components and loops, which hinders accurate representation of global structure. To address this issue, we propose Topology-Aware diffusion-based Graph Generation (TAGG), which uses persistent homology to sample synthetic graphs that closely resemble the structural characteristics of the original graphs. Specifically, we introduce two core components: 1) a Persistence Diagram Matching (PDM) loss, which ensures high topological fidelity of generated graphs, and 2) a Topology-aware Attention Module (TAM), which induces the denoising network to capture the homological characteristics of the original graphs. Extensive experiments on conventional graph benchmarks demonstrate the effectiveness of our approach, showing high generation performance across various metrics while achieving closer alignment with the distribution of topological features observed in the original graphs.
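To make "connected components and loops" concrete, the sketch below computes the persistence diagram of a graph under an edge-weight filtration with a union-find structure. The filtration choice (all vertices born at 0, edges entering in order of weight) and the name `persistence_graph` are assumptions for illustration; the sketch only produces diagrams and does not implement TAGG's PDM loss, which would compare such diagrams between generated and original graphs.

```python
import math
from typing import List, Tuple


def persistence_graph(num_nodes: int,
                      weighted_edges: List[Tuple[int, int, float]]
                      ) -> Tuple[List[Tuple[float, float]], List[float]]:
    """0- and 1-dimensional persistence of a graph under an edge filtration.

    Assumed filtration: every vertex is born at 0; edge (u, v, w) appears at w.
    Returns (dim-0 (birth, death) pairs, dim-1 birth times).
    """
    parent = list(range(num_nodes))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    dim0: List[Tuple[float, float]] = []
    dim1_births: List[float] = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            # Edge closes a loop: a 1-cycle is born and, with no 2-cells
            # in a graph, never dies.
            dim1_births.append(w)
        else:
            # Two components merge: one of them dies at time w.
            parent[ru] = rv
            dim0.append((0.0, w))
    # Components that survive the whole filtration never die.
    n_components = len({find(v) for v in range(num_nodes)})
    dim0.extend([(0.0, math.inf)] * n_components)
    return dim0, dim1_births
```

At the end of the filtration the number of infinite dim-0 pairs is the number of connected components b0, and the number of dim-1 births is the number of independent loops, which matches b1 = |E| - |V| + b0 for a graph. For a triangle with edge weights 1, 2 and 3, the result is `[(0.0, 1.0), (0.0, 2.0), (0.0, inf)]` and `[3.0]`.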
DTGB: A Comprehensive Benchmark for Dynamic Text-Attributed Graphs
Dynamic text-attributed graphs (DyTAGs) are prevalent in many real-world scenarios: each node and edge is associated with a text description, and both the graph structure and the text descriptions evolve over time. Despite their broad applicability, there is a notable scarcity of benchmark datasets tailored to DyTAGs, which hinders progress in many research fields. To address this gap, we introduce the Dynamic Text-attributed Graph Benchmark (DTGB), a collection of large-scale, time-evolving graphs from diverse domains whose nodes and edges carry dynamically changing text attributes and categories. To facilitate the use of DTGB, we design standardized evaluation procedures based on four real-world use cases: future link prediction, destination node retrieval, edge classification, and textual relation generation. These tasks require models to understand both dynamic graph structure and natural language, highlighting the unique challenges posed by DyTAGs.
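A DyTAG interaction can be stored as a timestamped edge with its own text and category, and future link prediction is commonly evaluated on a chronological split so that models never train on interactions from the future. The sketch below shows one such record type and split; the `TextEdge` fields, the function name and the 70/15/15 ratios are illustrative assumptions, not DTGB's documented data format or protocol.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TextEdge:
    """Hypothetical record for one DyTAG interaction."""
    src: int
    dst: int
    time: float
    text: str   # text description attached to this interaction
    label: int  # edge category (the target for edge classification)


def chronological_split(edges: List[TextEdge], train_frac: float = 0.7,
                        val_frac: float = 0.15
                        ) -> Tuple[List[TextEdge], List[TextEdge], List[TextEdge]]:
    """Split edges by time so every training edge precedes every eval edge.

    The default ratios are an assumption; split sizes are rounded down.
    """
    if not (train_frac > 0 and val_frac >= 0 and train_frac + val_frac < 1):
        raise ValueError("fractions must leave a non-empty test share")
    ordered = sorted(edges, key=lambda e: e.time)
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Sorting by timestamp before slicing is the design choice that matters here: a random split would leak later text and structure into training and inflate future-link-prediction scores.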
Graph Edit Distance with General Costs Using Neural Set Divergence
Graph Edit Distance (GED) measures the (dis-)similarity between two given graphs as the total cost of the cheapest edit sequence that transforms one graph into the other. GED is related to other notions of graph similarity, such as graph and subgraph isomorphism and maximum common subgraph. However, computing exact GED is NP-hard, which has recently motivated the design of neural models for GED estimation. These models, however, do not explicitly account for edit operations with different costs. In response, we propose $\texttt{GraphEdX}$, a neural GED estimator that works with general costs specified for the four edit operations, viz., edge deletion, edge addition, node deletion, and node addition. We first present GED as a quadratic assignment problem (QAP) that incorporates these four costs. Then, we represent each graph as a set of node and edge embeddings and use them to design a family of neural set divergence surrogates. We replace the QAP terms corresponding to each operation with their surrogates. Computing such neural set divergences requires aligning the nodes and edges of the two graphs. We learn these alignments using a Gumbel-Sinkhorn permutation generator, additionally ensuring that the node and edge alignments are consistent with each other. Moreover, these alignments are cognizant of both the presence and absence of edges between node pairs. Through extensive experiments on several datasets, with a variety of edit cost settings, we show that $\texttt{GraphEdX}$ consistently outperforms state-of-the-art methods and heuristics in terms of prediction error.
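The assignment view of GED with four separate costs can be checked on tiny unlabeled graphs by brute force: pad both graphs with isolated dummy nodes to the same size, and for each node alignment count node deletions, node additions, edge deletions and edge additions. This exhaustive search over permutations is exactly what neural estimators avoid; the function names and the padding to max(n1, n2) are assumptions made for this sketch, not GraphEdX's code.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]


def _adjacency(n: int, edges: Sequence[Edge]) -> List[List[int]]:
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def edit_cost(n1: int, e1: Sequence[Edge], n2: int, e2: Sequence[Edge],
              perm: Sequence[int], c_ed: float = 1.0, c_ea: float = 1.0,
              c_nd: float = 1.0, c_na: float = 1.0) -> float:
    """Cost of the edit path that aligns node i of G1 with node perm[i] of G2.

    Both graphs are padded with isolated dummy nodes to N = max(n1, n2).
    """
    n = max(n1, n2)
    a, b = _adjacency(n, e1), _adjacency(n, e2)
    cost = 0.0
    for i in range(n):
        real1, real2 = i < n1, perm[i] < n2
        if real1 and not real2:
            cost += c_nd  # real G1 node matched to padding: node deletion
        elif real2 and not real1:
            cost += c_na  # padding matched to real G2 node: node addition
        for j in range(i + 1, n):
            x, y = a[i][j], b[perm[i]][perm[j]]
            if x and not y:
                cost += c_ed  # edge only in G1: edge deletion
            elif y and not x:
                cost += c_ea  # edge only in G2: edge addition
    return cost


def exact_ged(n1: int, e1: Sequence[Edge], n2: int, e2: Sequence[Edge],
              **costs: float) -> float:
    """Exact GED by enumerating all N! alignments (tiny graphs only)."""
    n = max(n1, n2)
    return min(edit_cost(n1, e1, n2, e2, p, **costs)
               for p in permutations(range(n)))
```

Turning a triangle into a three-node path needs one edge deletion, so its GED is 1 with unit costs but 2 when `c_ed=2.0`, while the reverse direction costs `c_ea`. This asymmetry under unequal costs is what cost-agnostic estimators cannot represent.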
PARD: Permutation-Invariant Autoregressive Diffusion for Graph Generation
Specifically, we show that, contrary to sets, elements in a graph are not entirely unordered, and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network.
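One way to see why a structure-derived partial order stays permutation invariant is a degree-based peeling rule: repeatedly remove all nodes of minimum degree in the remaining graph as one block, then generate the blocks in reverse order. Tied nodes always land in the same block, so relabelling the graph never changes block membership. This peeling rule and the name `degree_peeling_blocks` are an illustrative assumption; the excerpt above does not state the exact ordering rule PARD uses.

```python
from typing import Dict, List, Sequence, Set, Tuple


def degree_peeling_blocks(num_nodes: int,
                          edges: Sequence[Tuple[int, int]]) -> List[Set[int]]:
    """Partition nodes into ordered blocks by minimum-degree peeling.

    Hypothetical rule: the block assignment depends only on degrees, so it is
    invariant to node relabelling (ties share a block instead of being broken).
    """
    nbrs: Dict[int, Set[int]] = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    remaining = set(range(num_nodes))
    blocks: List[Set[int]] = []
    while remaining:
        # Degrees are measured inside the graph that is still left.
        min_deg = min(len(nbrs[v] & remaining) for v in remaining)
        block = {v for v in remaining if len(nbrs[v] & remaining) == min_deg}
        blocks.append(block)
        remaining -= block
    # Peeling runs from the periphery inward; generation runs core-first.
    return blocks[::-1]
```

For a star with center 0 and leaves 1 to 3, the blocks are `[{0}, {1, 2, 3}]`: an autoregressive generator would first produce the center and then, conditioned on it, all leaves at once with an equivariant denoiser, so no arbitrary order is imposed among the interchangeable leaves.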